Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a flag to control the concurrent execution of aggregations #96023

Merged
merged 3 commits into from
May 15, 2023

Conversation

iverase
Copy link
Contributor

@iverase iverase commented May 11, 2023

Initial testing has shown that some aggregations are problematic when running on concurrency mode. In order to be able to run aggregations with concurrency, this PR proposes to add a flag that can tell if an aggregation block can be run concurrently. In addition it updates the AggregationTestCase to randomly run concurrently. This test has been used to uncover the problematic aggregations.

The problematic aggregations fall into two groups:

  1. The first group fails with error similar to:
java.lang.AssertionError: Sorted doc values are only supposed to be consumed in the thread in which they have been acquired. But was acquired in Thread[#52,test[T#1],5,TGRP-CardinalityAggregatorTests] and consumed in Thread[#36,TEST-CardinalityAggregatorTests.testIndexedAllDifferentValues-seed#[CAF09AA8E1D11977],5,TGRP-CardinalityAggregatorTests].
	at __randomizedtesting.SeedInfo.seed([CAF09AA8E1D11977:1F5E1886B5DDDAC8]:0)
	at org.apache.lucene.tests.index.AssertingLeafReader.assertThread(AssertingLeafReader.java:67)
	at org.apache.lucene.tests.index.AssertingLeafReader$AssertingSortedDocValues.getValueCount(AssertingLeafReader.java:917)
	at org.apache.lucene.index.SortedDocValuesTermsEnum.seekExact(SortedDocValuesTermsEnum.java:68)

The issue with this aggregations is that they have a post-collection phase that uses some generated data structures during the collection phase. Because this are run on different threads, we hit a lucene assertion. These aggregations are Cardinality, nested and Composite aggregations.

  1. The second group fails because the result of the aggregation might be different between concurrent and non-concurrent mode. This is actually expected because now each concurrent slice uses its own aggregator, so aggregation that might loose information before reduction might give different results. These aggregations are Terms and MultiTerms, the erros will depends on the shard size which now is applied to each concurrent slice.

Label the PR as non-issue as we don't support concurrent searches.

@iverase iverase requested review from martijnvg and javanna May 11, 2023 10:42
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 11, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a couple of questions, looks good!

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@iverase iverase merged commit f1c94af into elastic:main May 15, 2023
@iverase iverase deleted the supportsConcurrentExecution branch May 15, 2023 06:41
javanna added a commit to javanna/elasticsearch that referenced this pull request Aug 8, 2023
In elastic#98204 we are introducing unconditional offloading of collection to a
separate thread pool, even for requests of phases that don't enable
search concurrency. It turns out that some aggs don't support offloading
their collection to a separate thread, as their postCollect method is
executed on the search thread which trips a lucene assertion around
reusing data structures pulled from the search worker thread.

With this commit we allow aggs to specify when they don't support
offloading their sequential collection. Such aggs are a subset of the
ones that already declare that they don't support concurrency entirely.

Relates to elastic#96023
javanna added a commit that referenced this pull request Aug 8, 2023
In #98204 we are introducing unconditional offloading of collection to a
separate thread pool, even for requests of phases that don't enable
search concurrency. It turns out that some aggs don't support offloading
their collection to a separate thread, as their postCollect method is
executed on the search thread which trips a lucene assertion around
reusing data structures pulled from the search worker thread.

With this commit we allow aggs to specify when they don't support
offloading their sequential collection. Such aggs are a subset of the
ones that already declare that they don't support concurrency entirely.

Relates to #96023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.9.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants